Motivated by recent advances in Deep Learning for robot control, this paper considers two learning algorithms in terms of how they acquire demonstrations. "Human-Centric" (HC) sampling is the standard supervised learning algorithm, where a human supervisor demonstrates the task by teleoperating the robot to provide trajectories consisting of state-control pairs. "Robot-Centric" (RC) sampling is an increasingly popular alternative used in algorithms such as DAgger, where a human supervisor observes the robot executing a learned policy and provides corrective control labels for each state visited. RC sampling can be challenging for human supervisors and prone to mislabeling. RC sampling can also induce error in policy performance because it repeatedly visits areas of the state space that are harder to learn. Although policies learned with RC sampling can be superior to those learned with HC sampling for standard learning models such as linear SVMs, policies learned with HC sampling may be comparable when using highly-expressive learning models such as deep learning and hyper-parametric decision trees, which have little model error. We compare HC and RC using a grid world and a physical robot singulation task, where, in the latter, the input is a binary image of a connected set of objects on a planar worksurface and the policy generates a motion of the gripper to separate one object from the rest. We observe in simulation that for linear SVMs, policies learned with RC outperformed those learned with HC, but that with deep models this advantage disappears. We also find that with RC, the corrective control labels provided by humans can be highly inconsistent. We prove there exists a class of examples where, in the limit, HC is guaranteed to converge to an optimal policy while RC may fail to converge.
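To make the HC/RC distinction concrete, the following is a minimal Python sketch, not the paper's implementation: the toy environment (ChainEnv), the stand-in supervisor, and the function names (collect_hc, collect_rc) are all illustrative assumptions. The point it shows is who chooses the executed control: under HC the human's control is both the training label and the executed action, while under RC (as in DAgger) the robot's learned policy drives the state sequence and the human's corrective label is recorded but never executed.

```python
# Illustrative sketch only; all names and the toy environment are assumptions.

class ChainEnv:
    """Toy 1-D chain: the optimal behavior is to walk right toward state n-1."""
    def __init__(self, n=10):
        self.n = n
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, control):  # control is -1 (left) or +1 (right)
        self.state = max(0, min(self.n - 1, self.state + control))
        return self.state


def supervisor(state):
    """Stand-in human supervisor: always supplies the optimal control (+1)."""
    return +1


def collect_hc(env, human, horizon):
    """Human-Centric sampling: the human teleoperates, so the executed
    control and the training label coincide at every visited state."""
    data, s = [], env.reset()
    for _ in range(horizon):
        u = human(s)            # human both labels and drives the robot
        data.append((s, u))
        s = env.step(u)
    return data


def collect_rc(env, robot_policy, human, horizon):
    """Robot-Centric (DAgger-style) sampling: the learned policy drives,
    and the human's corrective label is stored but never executed."""
    data, s = [], env.reset()
    for _ in range(horizon):
        data.append((s, human(s)))      # corrective label only
        s = env.step(robot_policy(s))   # robot's own control moves the state
    return data


novice = lambda s: -1  # untrained policy that drifts the wrong way

print(collect_hc(ChainEnv(), supervisor, 5))           # states along the expert's path
print(collect_rc(ChainEnv(), novice, supervisor, 5))   # states the robot actually reaches
```

Running this sketch makes the difference in training distributions visible: HC gathers labels along states the supervisor's near-optimal trajectory visits, whereas RC gathers labels along states the robot's possibly poor policy actually reaches, which is precisely where the abstract notes the state space is harder to learn and corrective labeling is more error-prone.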